336 8.2  Molecular Simulation Methods

there are inevitably bottlenecks when one parallelized component of the simulation has to wait for the output of another before it can proceed. Multiple independent simulations can, of course, still be run simultaneously. Examples are independent replicates of a simulation that is stochastic in nature, or runs of a deterministic simulation from several different starting conditions.

3 Supercomputing resources have improved enormously, with dedicated clusters of ultrafast multicore CPUs coupled by locally dedicated ultrahigh-bandwidth networks. Academic research supercomputers are generally shared among many users. Supercomputing time is allocated so that calculations can be submitted as a block of batch jobs, which can then be distributed across different compute nodes in the cluster.

4 Recent developments in GPU technology have revolutionized molecular simulations. The primary function of a GPU is to render graphics and visual effects so that the computer's CPU does not have to. Even so, a modern GPU has many features that are attractive for brute-force number-crunching tasks, including molecular simulations. In essence, CPUs are designed to be flexible in performing many different types of task, for example, communicating with other systems in a computer. GPUs have a more limited scope but can perform basic numerical calculations very quickly. A programmable GPU contains several dedicated multicore processors that are well suited to Monte Carlo methods and MD simulations, with a computational power far in excess of a typical CPU. Depending on the design, a CPU core can execute up to 8 × 32-bit instructions per clock cycle (i.e., 256 bits per clock cycle). A fast GPU used for 3D video gaming can execute ~3200 × 32-bit instructions per clock cycle, a difference in throughput per clock cycle of a factor of ~400. A very-high-end CPU, for example, one with ~12 cores, has a higher clock rate of up to 2–3 GHz versus 0.7–0.8 GHz for GPUs. Yet even compared with four such 12-core CPUs coupled together, a single reasonably capable gaming GPU is faster by at least a factor of 5 and, at the time of writing, at least an order of magnitude cheaper. GPUs can now be programmed relatively easily to perform molecular simulations, outperforming more typical multicore CPUs by a speed factor of ~100.
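The per-clock figures quoted above can be turned into idealized peak rates with a few lines of arithmetic. This minimal Python sketch assumes that "peak rate" simply means instructions per clock cycle multiplied by clock frequency. That assumption ignores memory bandwidth and everything else that limits real simulations. The constant and function names are illustrative only.

```python
# Back-of-envelope peak-throughput comparison using the figures quoted in the text.
# Assumption: "peak rate" = instructions issued per clock cycle x clock frequency.
# Real simulation speed also depends on memory bandwidth, occupancy and the algorithm.

BITS_PER_INSTRUCTION = 32
CPU_CORE_INSTR_PER_CLOCK = 8    # up to 8 x 32-bit instructions per cycle (one CPU core)
GPU_INSTR_PER_CLOCK = 3200      # ~3200 x 32-bit instructions per cycle (fast gaming GPU)


def bits_per_clock(instr_per_clock: int) -> int:
    """Data processed per clock cycle, in bits."""
    return instr_per_clock * BITS_PER_INSTRUCTION


def peak_rate(instr_per_clock: float, clock_hz: float) -> float:
    """Idealized peak instruction rate, in instructions per second."""
    return instr_per_clock * clock_hz


# Per-clock advantage of one GPU over one CPU core.
per_clock_ratio = GPU_INSTR_PER_CLOCK / CPU_CORE_INSTR_PER_CLOCK

# Per-second advantage in the case least favourable to the GPU:
# CPU core at the top of its quoted clock range (3 GHz), GPU at the bottom (0.7 GHz).
cpu_rate = peak_rate(CPU_CORE_INSTR_PER_CLOCK, 3.0e9)
gpu_rate = peak_rate(GPU_INSTR_PER_CLOCK, 0.7e9)
per_second_ratio = gpu_rate / cpu_rate

print(f"bits per clock: CPU core {bits_per_clock(CPU_CORE_INSTR_PER_CLOCK)}, "
      f"GPU {bits_per_clock(GPU_INSTR_PER_CLOCK)}")    # 256 vs 102400
print(f"per-clock ratio: {per_clock_ratio:.0f}x")          # 400x
print(f"per-second ratio (3 GHz CPU core vs 0.7 GHz GPU): {per_second_ratio:.0f}x")  # 93x
```

Under these idealized assumptions, the CPU core's roughly fourfold clock-rate advantage reduces the GPU's ~400-fold per-clock advantage to roughly 93-fold per core. It does not come close to cancelling it.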

GPUs have now also been incorporated into supercomputing clusters. For example, the Blue Waters supercomputer at the University of Illinois at Urbana-Champaign is, as I write, the fastest supercomputer on any university campus and indeed one of the fastest supercomputers in the world. It can use four coupled GPUs. Just one of these GPUs performed a VMD calculation of the electrostatic potential for one frame of an MD simulation of the ribosome in just 529 s, as opposed to ~5.2 h on a single ultrafast CPU core. The ribosome is an enormously complex biological machine containing over 100,000 atoms, with a length scale of a few tens of nm (see Chapter 2).
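Converting the CPU time to seconds makes the size of this particular speedup explicit. This is simple unit arithmetic on the two timings quoted above:

```latex
t_{\mathrm{CPU}} \approx 5.2\,\mathrm{h} \times 3600\,\mathrm{s\,h^{-1}} = 18\,720\,\mathrm{s},
\qquad
\frac{t_{\mathrm{CPU}}}{t_{\mathrm{GPU}}} \approx \frac{18\,720\,\mathrm{s}}{529\,\mathrm{s}} \approx 35
```

That is roughly a 35-fold speedup of one GPU over one CPU core for this calculation.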

The key advantage of GPUs is that they currently offer better performance per dollar than several high-end CPU cores working together in a supercomputer. This holds whether the cores are linked over a distributed computer network or clustered together in the same machine. A GPU can be installed in an existing computer and may enable larger calculations for less money than building a cluster of computers. That said, several supercomputing clusters now also have GPU nodes. One caveat is that GPUs do not necessarily offer good performance on every computational task. Writing GPU code that uses memory efficiently can also still be challenging.

One should also be mindful of the size of the computational problem and whether a supercomputer is needed at all. Supercomputers should really be used for very large jobs that no other machine can take on, not to make a small job run a bit more quickly. If you are running a job that does not require many CPU cores, you should really use a smaller computer. Otherwise, you are simply hogging resources that would be better spent on something else. The same principle applies to all parallel computing, not just to problems in molecular simulation.
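Many modest jobs can run perfectly well on the cores of a single local machine. One example is the set of independent replicates of a stochastic simulation mentioned in point 2.

The following is a minimal Python sketch under stated assumptions:

- A toy one-dimensional random walk (the hypothetical function `random_walk_msd`) stands in for a real stochastic simulation.
- The standard-library `concurrent.futures.ProcessPoolExecutor` spreads replicates across local CPU cores, loosely analogous to a batch system distributing jobs across nodes.

Each replicate receives its own seed, so the results are reproducible regardless of which worker runs which replicate.

```python
"""Run independent replicates of a stochastic simulation on local CPU cores.

Hypothetical example: a 1D unbiased random walk stands in for a real
stochastic simulation (e.g. a Monte Carlo run). Each replicate gets its own
seed, so results do not depend on which worker runs which replicate.
"""
import os
import random
from concurrent.futures import ProcessPoolExecutor


def random_walk_msd(seed: int, n_steps: int = 1000, n_walkers: int = 200) -> float:
    """Mean squared displacement after n_steps, averaged over n_walkers walkers."""
    rng = random.Random(seed)  # private generator: no shared state between replicates
    total_sq = 0
    for _ in range(n_walkers):
        x = 0
        for _ in range(n_steps):
            x += 1 if rng.random() < 0.5 else -1  # unit step left or right
        total_sq += x * x
    return total_sq / n_walkers


def run_replicates(n_replicates: int, base_seed: int = 12345, max_workers=None) -> list[float]:
    """Run replicates in worker processes; max_workers=None uses the pool's default size."""
    seeds = [base_seed + i for i in range(n_replicates)]  # one distinct seed per replicate
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in seed order, whatever order the workers finish in
        return list(pool.map(random_walk_msd, seeds))


if __name__ == "__main__":
    print(f"local CPU cores: {os.cpu_count()}")
    results = run_replicates(8)
    # For an unbiased walk with unit steps, the expected MSD equals n_steps (1000 here).
    mean = sum(results) / len(results)
    print(f"per-replicate MSD: {[round(r) for r in results]}")
    print(f"mean over replicates: {mean:.1f}")
```

Because the replicates share nothing, the workers never wait on one another. This avoids the bottleneck of one component waiting for another's output. The `if __name__ == "__main__":` guard is needed on platforms where worker processes re-import the main module.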